Statistical analysis of living habits of individual persons and households
Authors
Jasmine Lundgren
Ha Do
Roosa Muranen
Load data
We start the analysis by importing the necessary libraries and loading the data. Only the columns examined in the analysis are imported,
and question-mark values ('?') are replaced with NaN.
   household ID  member ID  Day of week  Sex  Living environment  Age group  Cooking  Washing dishes  Listening to radio  Phonecall  Museum  Library
0         50007          2            2    2                 3.0          6        0              20                   0          0     2.0      1.0
1         50009          1            1    2                 1.0          7       40               0                   0          0     2.0      1.0
2         50015          1            1    1                 3.0          8       10               0                  10          0     2.0      1.0
3         50032          2            1    1                 2.0          8        0              10                   0          0     2.0      2.0
4         50033          1            1    2                 1.0          8    02:10           00:20               00:00      00:00     2.0      2.0
# Import the libraries needed for data processing
import pandas as pd
import numpy as np
import scipy.stats as s
from sklearn.preprocessing import StandardScaler
from kmodes.kprototypes import KPrototypes
import matplotlib.pyplot as plt
import seaborn as sns
import re
import statsmodels.stats as ss
import statsmodels.stats.multitest as ssm
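# Load the dataset from a file called habits.data
file_path = 'habits.data'
# Column names for the selected columns, in the same order as usecols
column_names = ['household ID', 'member ID', 'Day of week', 'Sex', 'Living environment', 'Age group',
                'Cooking', 'Washing dishes', 'Listening to radio', 'Phonecall',
                'Museum', 'Library']
# Skip the original header row, keep only the columns used in the analysis, and read '?' as NaN
data = pd.read_csv(file_path, sep=';', skiprows=1,
                   usecols=[0, 1, 2, 3, 4, 5, 7, 8, 16, 18, 21, 22],
                   names=column_names, na_values=['?'])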
data.head()
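As a quick check on the loading step, the sketch below shows how the na_values=['?'] argument turns question marks into NaN and how the missing values can then be counted per column. It does not read habits.data: the three-row sample, its values, and the reduced column set are made up for illustration only.

```python
import io

import pandas as pd

# Hypothetical sample in the same ';'-separated layout as the notebook's file.
# The rows and values are invented for illustration and are not from habits.data.
sample = io.StringIO(
    "id;member;cooking;museum\n"
    "1;1;10;?\n"
    "2;1;?;2\n"
    "3;2;30;1\n"
)

df = pd.read_csv(
    sample,
    sep=";",
    skiprows=1,          # skip the original header row, as in the notebook
    usecols=[0, 2, 3],   # keep only the columns of interest
    names=["household ID", "Cooking", "Museum"],
    na_values=["?"],     # read question marks as NaN
)

# Count how many values are missing in each column
missing_per_column = df.isna().sum()
print(missing_per_column)
```

Running the same isna().sum() call on the real data frame would show how many question marks each selected column contained before any further cleaning.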